Semantic-Guided Selective Representation for Image Captioning

Authors

Abstract

Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with a Relation-Aware Selection (RAS) module and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the grid-wise interactions, RAS can enhance the salient regions and channels, and suppress the less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise the selection process. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance on both off-line and on-line testing, i.e., 134.3 CIDEr for off-line testing and 135.4 CIDEr for on-line testing of MSCOCO. Extensive ablation studies and visualizations also validate the effectiveness of our scheme.

Similar resources

Text-Guided Attention Model for Image Captioning

Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...

A Distributed Representation Based Query Expansion Approach for Image Captioning

In Figure 1, we present more example results obtained with our approach on the benchmark datasets Flickr8K (Hodosh et al., 2013), Flickr30K (Young et al., 2014), MS COCO (Lin et al., 2014). We also provide groundtruth human descriptions for comparison. There are some cases where our approach falls short. In some of those cases, although the system does not produce the most desirable results, it...

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-tr...

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3243952